-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Graceful termination of the operator #2662
RFC: Graceful termination of the operator #2662
Conversation
8da2534
to
a9da17e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good solution. Would it make sense to add a customisable timeout?
Unfortunately, in testing of this it proves it's not a 100% solution. The operator waits as expected, but helm happily goes ahead and deletes the clusterrole and binding, preventing the operator from actually doing its work to clean up. I still need to try helm with |
Looks like even with foreground deletion, tigera-operator loses RBAC for the cluster before it can clean up after itself. So, we'll need to make sure the ClusterRole and Binding don't get deleted until the operator deployment itself is gone. I can't seem to find a way to do that in helm itself, so we probably need to resort to even more finalizers 😭 |
Probably what we should have done from the beginning is manage RBAC outside of helm (like the namespace, and CRDs). But that starts to mean a lot of yaml is managed outside of the chart. |
@caseydavenport would finalizers work in this context and if so what would be the increased RBAC surface area? Alternatively a pre delete Helm hook to delete the installation should work. |
Right, so this would be something along the lines of executing a Job (+ClusterRole +Binding) to delete the Installation prior to actually uninstalling the chart. That seems reasonable, I'll give that a try next. |
You can probably use the RBAC from the chart as a pre-delete hook would be run before they're removed. |
fac549f
to
02c62d4
Compare
It wasn't quite as trivial as I thought it might be, but I did get a pre-delete hook working for a basic Calico + apiserver install, in combination with this PR: projectcalico/calico#7859 One thing with this is that helm won't proceed cleaning up its CRs until the pre-delete hook is complete, and we don't want the pre-delete hook to completed until we're sure the Installation object has been dealt with. So, my solution is to:
|
Perhaps a smarter approach would be to have the apiserver controller put its own finalizer on the Installation (and any other controllers that depend on the Installation existing in order to perform their cleanup) |
|
||
// We need to wait for termination to complete. We can do this by checking if the Installation | ||
// resource has been cleaned up or not. | ||
to := 60 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set terminationGracePeriodSeconds
to 60s to match this since the default is 30s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One question about a change we might want to make to the operator manifest.
02c62d4
to
a7b2434
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Any news about this issue? |
@joaobettu hopefully you've seen the the messages on the issues this is addressing. If you're looking for some info specific to this PR please be more specific about what you're looking for. |
Description
Allow for graceful termination of the operator to occur when both the operator and the Installation resource have been deleted simultaneously.
This was causing issues such as the following:
I am not sure this is a perfect solution, but in very limited testing it seems to be doing the trick for me!
For PR author
make gen-files
make gen-versions
For PR reviewers
A note for code reviewers - all pull requests must have the following:
kind/bug
if this is a bugfix.kind/enhancement
if this is a a new feature.enterprise
if this PR applies to Calico Enterprise only.